Understanding Transformer from the Perspective of Associative Memory

Zhong, Shu, Xu, Mingyu, Ao, Tenglong, Shi, Guang

arXiv.org Artificial Intelligence

In this paper, we share our reflections and insights on understanding Transformer architectures through the lens of associative memory -- a classic psychological concept inspired by human cognition. We start with the basics of associative memory (think simple linear attention) and then dive into two dimensions:

Memory Capacity: How much can a Transformer really remember, and how well? We introduce retrieval SNR to measure this and use a kernel perspective to mathematically reveal why Softmax Attention is so effective. We also show how FFNs can be seen as a type of associative memory, leading to insights on their design and potential improvements.

Memory Update: How do these memories learn and evolve? We present a unified framework for understanding how different Transformer variants (like DeltaNet and Softmax Attention) update their "knowledge base". This leads us to tackle two provocative questions: 1. Are Transformers fundamentally limited in what they can express, and can we break these barriers? 2. If a Transformer had infinite context, would it become infinitely intelligent?

We want to demystify Transformer architecture, offering a clearer understanding of existing designs. This exploration aims to provide fresh insights and spark new avenues for Transformer innovation.
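The associative-memory view of linear attention can be made concrete in a few lines of NumPy. This is an illustrative sketch of the general idea, not the paper's code: key-value pairs are written into a state matrix via outer products, and reading with a stored key returns the target value plus crosstalk from the other pairs, which motivates a retrieval signal-to-noise ratio. The dimensions and the particular SNR definition below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 8                                    # feature dim, number of stored pairs
K = rng.standard_normal((n, d)) / np.sqrt(d)    # keys, roughly unit norm
V = rng.standard_normal((n, d))                 # values

# Write phase: M = sum_i v_i k_i^T (the "state" of linear attention)
M = V.T @ K

# Read phase: query the memory with a stored key
q = K[0]
retrieved = M @ q                               # = sum_i v_i (k_i . q)

# Retrieval decomposes into the target value (signal) plus
# interference from the other stored pairs (noise).
signal = V[0] * (K[0] @ q)
noise = retrieved - signal
snr = np.linalg.norm(signal) / np.linalg.norm(noise)
print(f"retrieval SNR: {snr:.2f}")
```

As more pairs are packed into the fixed-size matrix M, the interference term grows and the SNR drops, which is the capacity question the abstract raises.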


8b5040a8a5baf3e0e67386c2e3a9b903-Reviews.html

Neural Information Processing Systems

Summary: This paper addresses the problem of conditional density estimation with a high-dimensional input space (p ≫ n), an important problem as most (if not all) current models for nonparametric conditional density estimation do not scale to high dimensions. Moreover, datasets with high-dimensional inputs but relatively small sample sizes are becoming increasingly common. The model for the conditional density f(y | x) is defined in three stages. First, a tree structure is defined over the input space. Second, given the tree structure, C_{j,k}, the kth partition of the X space at scale j, is mapped to a lower-dimensional space.


Estimating Density Models with Complex Truncation Boundaries

Liu, Song, Kanamori, Takafumi

arXiv.org Machine Learning

Truncated densities are probability density functions defined on truncated input domains. These densities share the same parametric form with their non-truncated counterparts up to a normalization term. However, normalization terms usually cannot be obtained in closed form for these distributions, due to complicated truncation domains. Score Matching is a powerful tool for fitting parameters in unnormalized models. However, it cannot be straightforwardly applied here, as the boundary conditions used to derive a tractable objective are usually not satisfied by truncated distributions. In this paper, we propose a maximally weighted Score Matching objective function which takes the geometry of the truncation boundary into account when fitting unnormalized density models. We show that the weighting function maximizing the objective function can be constructed easily and that the boundary conditions for deriving a tractable objective are satisfied.
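A hedged sketch of the general weighted score-matching idea, not the paper's maximally weighted construction: for a 1D Gaussian truncated to (0, ∞), choosing a weight g(x) = x that vanishes at the boundary lets integration by parts go through, yielding the tractable objective E[g(x)(0.5 s(x)² + s′(x)) + g′(x) s(x)] for the model score s. The weight choice and toy parameters below are assumptions.

```python
import numpy as np

def weighted_sm_loss(mu, sigma, x):
    """Weighted score-matching loss for a Gaussian model on truncated data x > 0."""
    s = -(x - mu) / sigma**2        # score of N(mu, sigma^2)
    ds = -1.0 / sigma**2            # derivative of the score
    g, dg = x, 1.0                  # weight g(x) = x vanishes at the boundary 0
    return np.mean(g * (0.5 * s**2 + ds) + dg * s)

# Rejection-sample a N(2, 1) truncated to (0, inf)
rng = np.random.default_rng(0)
z = rng.normal(2.0, 1.0, size=20000)
x = z[z > 0][:5000]

# The loss is smaller at the true mean than at a badly misspecified one,
# even though the truncated density's normalizer was never computed.
loss_true = weighted_sm_loss(2.0, 1.0, x)
loss_bad = weighted_sm_loss(5.0, 1.0, x)
print(loss_true, loss_bad)
```

The point of the sketch is that the objective is evaluated from samples and the unnormalized model alone; the intractable normalization term of the truncated density never appears.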


Structure Learning of Partitioned Markov Networks

Liu, Song, Suzuki, Taiji, Sugiyama, Masashi, Fukumizu, Kenji

arXiv.org Machine Learning

We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between two groups directly may be a preferable option. We introduce a novel concept called the \emph{partitioned ratio} whose factorization directly associates with the Markovian properties of random variables across two groups. A simple one-shot convex optimization procedure is proposed for learning the \emph{sparse} factorizations of the partitioned ratio and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with the state of the art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in US congress and pairwise DNA/time-series alignments are also reported.


Support Consistency of Direct Sparse-Change Learning in Markov Networks

Liu, Song (Tokyo Institute of Technology, Japan) | Suzuki, Taiji (Tokyo Institute of Technology, Japan) | Sugiyama, Masashi (University of Tokyo, Japan)

AAAI Conferences

We study the problem of learning sparse structure changes between two Markov networks P and Q. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes directly via estimating the ratio between the two Markov network models. Such a direct approach was demonstrated to perform excellently in experiments, although its theoretical properties remained unexplored. In this paper, we give sufficient conditions for successful change detection with respect to the sample sizes n_p and n_q, the dimension of the data m, and the number of changed edges d.
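A toy illustration of why the direct ratio approach targets exactly the changed edges (assumptions: Gaussian Markov networks with hand-picked precision matrices; this is not the paper's estimator): for Gaussians, log p(x)/q(x) is a quadratic form whose coefficient matrix is the difference of precision matrices, so sparsity of that difference encodes the changed edges directly, without recovering either full network.

```python
import numpy as np

# Precision matrices of two Gaussian Markov networks P and Q;
# nonzero off-diagonal entries correspond to edges.
Theta_p = np.array([[2.0, 0.5, 0.0],
                    [0.5, 2.0, 0.3],
                    [0.0, 0.3, 2.0]])
Theta_q = Theta_p.copy()
Theta_q[0, 1] = Theta_q[1, 0] = 0.0   # edge (0, 1) is removed in Q

# log p(x)/q(x) = -0.5 x^T (Theta_p - Theta_q) x + const, so the ratio
# model only needs the sparse difference Delta, not either full network.
Delta = Theta_p - Theta_q
changed = np.argwhere(np.triu(np.abs(Delta) > 1e-12, k=1))
print(changed)
```

Here only the single changed edge (0, 1) shows up in Delta, while the shared structure of P and Q cancels out of the ratio entirely.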


Group Sparse Priors for Covariance Estimation

Marlin, Benjamin, Schmidt, Mark, Murphy, Kevin

arXiv.org Machine Learning

Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive-definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block-structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real-world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure.
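The penalized-likelihood objective that the paper reinterprets as MAP estimation can be written down in a few lines. A minimal sketch, assuming the standard graphical-lasso form with an off-diagonal l1 penalty (the group l1,2 variant would replace the absolute values with per-block l2 norms); the sample data and penalty weight `lam` are illustrative, not from the paper.

```python
import numpy as np

def penalized_loglik(Omega, S, lam):
    """l1-penalized Gaussian log-likelihood of a precision matrix Omega.

    log det(Omega) - tr(S Omega) - lam * sum_{i != j} |Omega_ij|
    Under the paper's view, the penalty term is the log of an l1
    positive-definite matrix prior, so maximizing this is MAP estimation.
    """
    sign, logdet = np.linalg.slogdet(Omega)
    assert sign > 0, "Omega must be positive definite"
    off_diag_l1 = np.abs(Omega).sum() - np.abs(np.diag(Omega)).sum()
    return logdet - np.trace(S @ Omega) - lam * off_diag_l1

# Evaluate the objective at the identity precision for toy data
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
S = np.cov(X, rowvar=False)        # sample covariance
val = penalized_loglik(np.eye(5), S, lam=0.1)
print(val)
```

Because the penalty decomposes over entries (or groups), varying `lam` per group, as the hierarchical model does, changes only the last term of this objective.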